Week 06: Cool Type Checking & Runtime Organization

Cool Type Checking

Static vs. Dynamic Typing

Static type systems detect common errors at compile time.
But some correct programs are disallowed.
- Some argue for dynamic type checking instead.
- Others want more expressive static type checking.
Def.
- The dynamic type of an object is the class C that is used in the new C expression that created it at run-time.
- The static type of an expression captures all dynamic types the expression could have at compile-time.

Soundness Theorem

Target
- For all expressions $E$, dynamic_type(E) = static_type(E)
- Sample
  - class B inherits A;
  - x:A <- new B;
  - Static type of x: x:A
  - Dynamic type of x: new B
Soundness theorem for the Cool type system: $$\forall E,\ dynamic\_type(E) \leq static\_type(E)$$
Subclasses only add attributes or methods
Methods can be redefined but with same type!

Self Type

Scenario (Why we need to use SELF_TYPE)

class Count {
    i : int <- 0;
    inc () : Count { {
        i <- i + 1;
        self;
      }};
};
class Stock inherits Count { name : String; };
class Main { Stock a <- (new Stock).inc(); ... a.name ... };

Because inc() will return Count.
- Use inc():SELF_TYPE instead.
The type checker can now prove
- $O,M,C\vdash (new\ Count).inc():Count$
- $O,M,C\vdash (new\ Stock).inc():Stock$
Remark: SELF_TYPE is not a dynamic type. It is a static type. In cgen, it will fetch the class tag of the instance.

In general, if SELF_TYPE appears textually in the class $C$ as the declared type of $E$ then $dynamic\_type(E) \leq C$

The meaning of SELF_TYPE depends on where it appears

$SELF\_TYPE_C$ to refers to an occurrence of $SELF\_TYPE$ in the body of $C$
$SELF\_TYPE_C \leq C$

Type Operation

$T_1 \leq T_2$: $T_1$ is a subtype of $T_2$
$lub(T_1, T_2)$: the least-upper bound of $T_1$ and $T_2$

Let $T$ and $T’$ be any types but SELF_TYPE. Theorems between two types:

$SELF\_TYPE_C \leq SELF\_TYPE_C$
- $lub(SELF\_TYPE_C, SELF\_TYPE_C) = SELF\_TYPE_C$
$SELF\_TYPE_C \leq T$, if $C\leq T$
- $SELF\_TYPE_C$ can be any subtype of $C$
- Include $C$ itself.
- $lub(SELF\_TYPE_C, T) = lub(C, T)$
$T \leq SELF\_TYPE_C$ always false.
- Special case: $SELF\_TYPE_C$ is a path and $T$ is the leaf of the path.
- $lub(SELF\_TYPE_C, T) = lub(C, T)$
$T \leq T'$
- $lub(T, T’)$

Self Type Usage

SELF_TYPE is not allowed everywhere a type can appear.

class T inherits T’ {...}: T and T' cannot be SELF_TYPE
x:T: T can be SELF_TYPE
let x: T in E: T can be SELF_TYPE
new T: T can be SELF_TYPE
m@T(E_1, ..., E_n): T cannot be SELF_TYPE
m(x : T) : T’ { ... }: Only T’ can be SELF_TYPE

Self Type Checking

The meaning of SELF_TYPE depends on the enclosing class $C$ $(O,M,C\vdash e:T)$.

We use lub to design type rules using SELF_TYPE

[Dispatch] $$\frac{O,M,C\vdash e_0:T_0\\ O,M,C\vdash e_1:T_1\\ \dots\\ O,M,C\vdash e_n:T_n\\ M(T_0, f)=(T'_{1},\dots,T'_{n'},T'_{n+1})\\ T'_{n+1} \neq SELF\_TYPE\\ T_i\leq T'_{i},\ for\ 1\leq i\leq n}{O,M,C\vdash e_0.f(e_1,\dots,e_n):T'_{n+1}}$$
[Static Dispatch] $$\frac{O,M,C\vdash e_0:T_0\\ O,M,C\vdash e_1:T_1\\ \dots\\ O,M,C\vdash e_n:T_n\\ M(T_0, f)=(T'_{1},\dots,T'_{n'},T'_{n+1})\\ T'_{n+1} \neq SELF\_TYPE\\ T_i\leq T'_{i},\ for\ 1\leq i\leq n}{O,M,C\vdash e_0@T.f(e_1,\dots,e_n):T'_{n+1}}$$

New rules using SELF_TYPE

$$\frac{}{O,M,C\vdash self: SELF\_TYPE_C} $$
$$\frac{}{O,M,C\vdash new\ SELF\_TYPE: SELF\_TYPE_C} $$

Error Recovery in Semantic

Problem

What type is assigned to an expression with no legitimate type?
This type will influence the typing of the enclosing expression.

Solution

Assign type Object to ill-typed expressions.
Or assign a new type No_type for use with ill-typed expressions
- No_type $\leq$ C, for all types $C$
- Every operation is defined for No_type
- The class hierarchy is not a tree anymore.
The Object solution is fine in the course project.

Runtime Organization

Introduction

Goal

Management of Run-time Resources
Correspondence between static (compile-time) and dynamic (run-time) structures
Storage organization

When a program is invoked

The OS allocates space for the program
The code is loaded into part of the space
The OS jumps to the entry point (i.e., “main”)

Activations

Goal

Correctness of the program
Speed of the program

Assumption

Execution is sequential; control moves from one point in a program to another in a well-defined order.
When a procedure is called, control always returns to the point immediately after the call.

Activation

An invocation of procedure $P$ is an activation of $P$.
The lifetime of an activation of $P$ is
- (Run-time) All the steps to execute $P$
- (Nested) Including all the steps in procedures $P$ calls
The lifetime of a variable x is the portion of execution in which x is defined.
- Lifetime is a dynamic (run-time) concept.
- Scope is a static concept.
Lifetimes of procedure activations are properly nested.
- Use a stack to track currently active procedures.

Activation Records

The information needed to manage one procedure activation is called an activation record (AR) or frame.

Scenario

A procedure $F$ calls $G$, then $G$'s activation record contains a mix of info about $F$ and $G$. $F$ is suspended until $G$ completes, at which point $F$ resumes.

$G$’s AR contains

Complete execution of $G$
Resume execution of $F$
Space for $G$'s return value
Actual parameters
Pointer to the previous activation record (control link: points to AR of caller of $G$)
Machine status prior to calling $G$
- Contents of registers & program counter
- Local variables
Other temporary values

AR for function f

(Low)
Code
Static Data
Main
f: result
f: argument
f: control link
f: return address
Empty
Heap
(High)

Globals & Heaps

Globals are assigned a fixed address once.

Memory Management

Code Segment: Object code
- For many languages, fixed size and read only
Static: data (not code) with fixed addresses (e.g., global data)
- Fixed size, may be readable or writable
Stack: an AR for each currently active procedure
- Each AR usually fixed size, contains locals
Heap: all other data
- In C, heap is managed by malloc and free.

Alignment

Data is word aligned if it begins at a word boundary.

To align the 5-ch string, add 3 padding characters to the string.

Stack Machines

Goal

Use the stack to handle operating.

An instruction $r = F(a_1,\dots,a_n)$

Pops n operands from the stack
Computes the operation $F$ using the operands
Pushes the result r on the stack

Location of the operands/result always on the top of the stack.

n-register Stack Machine: keep the top n locations of the pure stack machine's stack in registers

1-register stack machine: The register is called the accumulator

Add: acc = acc + top_of_stack

Consider an expression $op(e_1,...,e_n)$ ($e_1,...,e_n$ are subexpressions)

For each $e_i\ (0 < i < n)$
- Compute $e_i$
- Push result on the stack
pop n-1 values from the stack, compute op
Store result in the accumulator